Biometric Analytics Cost Estimating
Abstract
This paper examines the interplay between biometric technologies and advanced analytics, referred to as biometric analytics, as a way to detect fraudulent entries in a biometric system. We follow a systematic approach, based on cost estimating standards, to ascertain whether deploying such a capability is worthwhile. A simple case study is presented that illustrates the key aspects of our analysis. In addition, there is a series of cost elements that quantify the impact biometric vulnerabilities have on individuals, companies, and countries.

Introduction

As biometric technologies and systems find their way into more and more facets of daily life, the need for such systems to be reliable and secure has become all the more important. Concurrently, fields such as data science are growing rapidly and are providing organizations with new and powerful ways to leverage large quantities of data to drive improvements and transformation. This paper aims to explore the confluence of these two fields, which we refer to as biometric analytics, and to begin estimating the benefits and costs of implementing a biometric analytics capability that addresses the problem of fraudulent records in a biometric data store.

This paper is organized as follows. We begin by presenting a focused background on biometrics and advanced analytics to orient the reader and introduce the motivation behind our study. A brief summary of our estimation approach follows. With our approach established, we present an example case study that illustrates the key aspects of the problem. We conclude with some salient points that emerged during our analysis and thoughts for future work.

Background

In this section, we provide the reader with some relevant background information on the topics of biometrics and advanced analytics.

Biometrics

Biometric technologies measure and analyze human physiological and behavioral characteristics.
Identifying an individual’s physiological characteristics is based on direct measurement of a part of the body, e.g., fingertips, hands, face, and eye retinas and irises. Identifying behavioral characteristics is based on information derived from actions, such as speech and how one signs his/her name. Because the characteristics they measure are thought to be distinct to each person, biometrics can be very effective personal identifiers. Unlike more traditional identification methods that rely on something one has, such as an identification card for building access, or something one knows, such as a PIN to access an ATM, biometrics are integral to something about the individual. Being inherently linked to the individual, they are more reliable, cannot be forgotten, and are less easily lost, stolen, or spoofed.

While biometric technologies vary in complexity, capabilities, and performance, all share several elements in common. At a fundamental level, all biometric identification systems reduce to pattern recognition systems. They use sensors such as cameras and scanning devices to capture images, recordings, or measurements of an individual’s characteristics along with computer hardware and software to extract, encode, store, and compare these characteristics. Because the process is almost always automated, biometric decision-making is typically very fast, and in some cases, real-time.

Depending on the application, biometric systems can be used in one of two modes: verification or identification. Verification, or authentication, is used to verify a person’s identity; i.e., to authenticate that an individual’s reported identity is their true identity. Identification, on the other hand, is used to establish a person’s identity; i.e., to determine who a person is.

Although biometric technologies measure different characteristics in substantially different ways, all biometric systems involve similar processes that can be divided into two distinct stages: (1) enrollment and (2) verification or identification.

In enrollment, a biometric system is populated with the information needed to identify a specific person. The person first provides an identifier, such as an identification card. He or she then presents the biometric (e.g., fingertips, hand, iris) to a suitable acquisition device, the distinctive features are located, and one or more samples are extracted, encoded, and stored as a reference template for future comparisons. Finally, this biometric is linked to the identity specified on the identifier.

In verification systems, the objective is to verify that a person is who he or she claims to be (i.e., the person who enrolled). After the individual provides the identifier that was used during enrollment, the specific biometric is presented. The system captures the biometric and generates a trial template. The system then compares the trial biometric template with this person’s reference template to determine whether the individual’s trial and stored templates match. Verification is often referred to as 1:1 (one-to-one) matching. Verification systems can contain databases ranging from dozens to millions of enrolled templates but are always predicated on matching an individual’s presented biometric against his or her reference template.

In identification systems, the objective is to identify who a person is. Unlike verification systems, an identifier is not necessary. To find a match, instead of locating and comparing the person’s reference template against his or her presented biometric, the trial template is compared against the stored reference templates of all individuals enrolled in the system.
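
The difference between the two matching modes can be made concrete with a minimal Python sketch. Everything in it is an illustrative assumption rather than part of any real biometric system: templates are modeled as short lists of numbers, the comparison is a plain cosine similarity, the 0.95 threshold is arbitrary, and the names (similarity, verify, identify, enrolled) are hypothetical. Production matchers are modality-specific and considerably more involved.

```python
import math

def similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def verify(trial, claimed_id, enrolled, threshold=0.95):
    """Verification: compare the trial template with the single
    reference template stored under the claimed identity."""
    reference = enrolled.get(claimed_id)
    if reference is None:
        return False
    return similarity(trial, reference) >= threshold

def identify(trial, enrolled, threshold=0.95):
    """Identification: compare the trial template with every enrolled
    reference template and return the best match above the threshold,
    or None if no template matches well enough."""
    best_id, best_score = None, threshold
    for person_id, reference in enrolled.items():
        score = similarity(trial, reference)
        if score >= best_score:
            best_id, best_score = person_id, score
    return best_id

# Hypothetical enrolled reference templates (toy values, not real data).
enrolled = {
    "alice": [0.9, 0.1, 0.3],
    "bob": [0.2, 0.8, 0.5],
}
trial = [0.88, 0.12, 0.31]  # a fresh capture, close to alice's template

print(verify(trial, "alice", enrolled))
print(verify(trial, "bob", enrolled))
print(identify(trial, enrolled))
```

In this sketch, verify performs one comparison regardless of database size, whereas the work done by identify grows with the number of enrolled templates.
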
Identification systems are referred to as 1:N (one-to-N, or one-to-many) matching because an individual’s biometric is compared against multiple biometric templates in the system’s database.

Advanced Analytics

The field of analytics is as broad as that of biometrics, arguably broader. At its core, analytics is the discovery and communication of meaningful patterns in data, relying on the simultaneous application of statistics and mathematics, computer programming, and data manipulation to extract valuable knowledge from data. While analytics can be as austere as fitting a line to a set of data points, it can also be as complex as developing an artificial neural network to perform speech recognition. In the context of our analysis, we will focus on more complex analytics, rooted in the field of machine learning.

Machine learning, as defined rather formally by Mitchell [1], is a framework where, “a computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.” More intuitively, machine learning is most commonly used to mean the application of induction algorithms and other algorithms that can be said to “learn.” An algorithm is said to be inductive if it takes as input specific instances and produces a model that generalizes beyond these instances.

The learning aspect is typically realized through a process called supervised learning, wherein the algorithm is presented with a training data set from which to learn. This training data set consists of example inputs and their desired outputs, or “labels.” For instance, the inputs could be physical characteristics of a person, such as height, weight, hair color, and so on; the corresponding labels could then be male or female. The algorithm would use these inputs and the corresponding labels to “learn” a model that maps inputs to outputs.
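
As a rough illustration of supervised learning, the sketch below trains a nearest-centroid classifier in plain Python on the height and weight example. The algorithm choice, the function names (train, predict), and the six training rows are all assumptions made for illustration; the numbers are invented toy values, not data from any study.

```python
from collections import defaultdict

def train(inputs, labels):
    """Learn a model from labeled examples: one centroid
    (mean feature vector) per label."""
    grouped = defaultdict(list)
    for features, label in zip(inputs, labels):
        grouped[label].append(features)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }

def predict(model, features):
    """Return the label whose centroid is nearest to the given
    features (squared Euclidean distance)."""
    def distance(centroid):
        return sum((f - c) ** 2 for f, c in zip(features, centroid))
    return min(model, key=lambda label: distance(model[label]))

# Invented toy training set: (height in cm, weight in kg) and a label.
train_inputs = [(180, 82), (175, 78), (185, 90),
                (162, 55), (158, 52), (167, 60)]
train_labels = ["male", "male", "male",
                "female", "female", "female"]

model = train(train_inputs, train_labels)

# Inputs that were not in the training set: the model generalizes
# beyond the specific instances it was shown.
print(predict(model, (178, 80)))
print(predict(model, (160, 54)))
```

The model here is just two averaged points, which is enough to show the inductive step: specific labeled instances go in, and a rule that applies to unseen inputs comes out.
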

In cases where a training data set is not available, one turns to unsupervised learning. Here, no labels are given to the algorithm, leaving it on its own to find structure in the input data. Clustering, or grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups, is a classical unsupervised machine learning task.

Motivation

Biometric systems have long been used in law enforcement and forensics and, more recently, have gained prominence as a reliable, cost-effective means of personal authentication. Government and commercial applications include immigration, border control, airport security, physical access control, ATM authentication, and mobile device security [2]. To quickly perform verification of a subject or to perform identification against a watch list, government agencies and other users of these systems maintain large databases of digital biometric records. For example, today, the FBI’s Integrated Automated Fingerprint Identification System (IAFIS) contains fingerprint records for over 64 million individuals [3].

Biometric systems are vulnerable to attacks at various stages in the biometric recognition process, including attacks on the database in which enrolled entries are stored. Large biometric databases pose challenges to testing and protecting the integrity of collected data. For example, fingerprint databases may be vulnerable to cyberattacks aimed at impersonating or concealing an individual’s identity through the use of synthetically generated fingerprint images (spoofs). These images could allow the attacker to replace their own fingerprints in the database so that the attacker is not recognized during subsequent identification attempts.
Additionally, an attacker can perform a “masquerade attack,” in which they impersonate another individual by injecting a synthetic image that has been reconstructed from the desired individual’s feature set.

A number of advanced analytics techniques (e.g., machine learning approaches) have been proposed to address the problem of spoofed biometric detection [4, 5]. The belief is that with the rapid growth of fields such as data science and our ability to mine and exploit massive data stores (such as those associated with biometric records), identification of spoofed records in large databases should now be possible. Furthermore, identification of fraudulent biometric authentications, in near-real-time, should also be practical.

In order to transform these possibilities into reality, a biometric analytics framework is needed. The exact details of such a framework will depend on the application. For instance, detecting spoofed biometric records in a database would likely require a form of unsupervised clustering wherein bona fide records are assigned to one group and spoofs to another. In contrast, detecting a fraudulent authentication could be accomplished using a supervised algorithm that has been trained to recognize legitimate biometric features as different from spoofed features. In any event, the goal of implementing a biometric analytics capability would be to reduce, in an automated fashion, the instances of fraud within a system. Such a reduction would translate to cost savings, whether readily quantifiable (e.g., reducing welfare abuse) or less tangible, such as reducing occurrences of illegal entry or access (e.g., illegal entry into the United States by someone on a watch list).

Clearly, these sorts of benefits do not come without a cost. In this case, it is the cost of developing, implementing, and maintaining a biometric analytics capability.
Determining whether or not adopting such a capability is ultimately worthwhile is an important decision that requires a systematic analysis. Our approach for estimating these costs and benefits is outlined in the following section.